UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition
Recently proposed robust 3D face alignment methods establish either dense or
sparse correspondence between a 3D face model and a 2D facial image. The use of
these methods presents new challenges as well as opportunities for facial
texture analysis. In particular, by sampling the image using the fitted model,
a facial UV map can be created. Unfortunately, due to self-occlusion, such a UV
map is always incomplete. In this paper, we propose a framework for training a
Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted
from in-the-wild images. To this end, we first gather complete UV maps by
fitting a 3D Morphable Model (3DMM) to various multiview image and video
datasets, as well as leveraging a new 3D dataset with over 3,000 identities.
Second, we devise an architecture that combines local and global adversarial
DCNNs to learn an identity-preserving facial UV completion
model. We demonstrate that by attaching the completed UV to the fitted mesh and
generating instances of arbitrary poses, we can increase pose variations for
training deep face recognition/verification models, and minimise pose
discrepancy during testing, both of which lead to better performance. Experiments on
both controlled and in-the-wild UV datasets prove the effectiveness of our
adversarial UV completion model. We achieve state-of-the-art verification
accuracy under the CFP frontal-profile protocol solely by combining
pose augmentation during training and pose discrepancy reduction during
testing. We will release the first in-the-wild UV dataset (referred to as
WildUV), which comprises complete facial UV maps from 1,892 identities, for
research purposes.
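The combination of reconstruction, local and global adversarial, and identity-preserving objectives implied by the abstract can be sketched as follows. This is a minimal numpy illustration: the loss weights, the identity-feature distance, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def composite_uv(uv_incomplete, uv_generated, mask):
    """Paste generated texture only into the self-occluded (mask == 1)
    region, so visible pixels from the fitted UV map are kept exactly."""
    return mask * uv_generated + (1 - mask) * uv_incomplete

def generator_loss(d_global_score, d_local_score, uv_completed, uv_target,
                   id_feat_completed, id_feat_target,
                   w_adv=0.01, w_id=0.1):
    """Toy combination of three terms: pixel reconstruction, global plus
    local adversarial, and identity preservation (weights are arbitrary)."""
    l_pix = np.mean(np.abs(uv_completed - uv_target))           # L1 reconstruction
    l_adv = -0.5 * (np.log(d_global_score) + np.log(d_local_score))
    l_id = np.sum((id_feat_completed - id_feat_target) ** 2)    # identity distance
    return l_pix + w_adv * l_adv + w_id * l_id
```

The compositing step mirrors standard inpainting practice: the generator is only trusted where the fitted model could not sample the image.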
Domain-General Crowd Counting in Unseen Scenarios
Domain shift across crowd data severely hinders crowd counting models from
generalizing to unseen scenarios. Although domain adaptive crowd counting
approaches close this gap to a certain extent, they are still dependent on the
target domain data to adapt (e.g. finetune) their models to the specific
domain. In this paper, we aim to train a model based on a single source domain
which can generalize well on any unseen domain. This falls into the realm of
domain generalization that remains unexplored in crowd counting. We first
introduce a dynamic sub-domain division scheme which divides the source domain
into multiple sub-domains such that we can initiate a meta-learning framework
for domain generalization. The sub-domain division is dynamically refined
during the meta-learning. Next, in order to disentangle domain-invariant
information from domain-specific information in image features, we design the
domain-invariant and -specific crowd memory modules to re-encode image
features. Two types of losses, i.e. feature reconstruction and orthogonal
losses, are devised to enable this disentanglement. Extensive experiments on
several standard crowd counting benchmarks, i.e., SHA, SHB, QNRF, and NWPU, show
the strong generalizability of our method. Comment: Accepted to AAAI 2023 as an Oral Presentation.
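The disentanglement objective described above, separating domain-invariant from domain-specific features, is commonly enforced with a reconstruction term and an orthogonality term. A minimal numpy sketch, assuming the two re-encoded parts sum to the original feature (the function names and the exact loss forms are assumptions, not the paper's definitions):

```python
import numpy as np

def orthogonal_loss(f_inv, f_spec):
    """Penalize overlap between domain-invariant and domain-specific
    features: squared Frobenius norm of their cross-correlation."""
    return np.sum((f_inv.T @ f_spec) ** 2)

def reconstruction_loss(feat, f_inv, f_spec):
    """The two re-encoded parts should jointly reconstruct the original
    image feature; here the recombination is a simple sum."""
    return np.mean((feat - (f_inv + f_spec)) ** 2)
```

Driving the orthogonal loss to zero makes the two feature subspaces carry non-redundant information, while the reconstruction loss prevents either branch from discarding content.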
Confidence-guided Centroids for Unsupervised Person Re-Identification
Unsupervised person re-identification (ReID) aims to train a feature
extractor for identity retrieval without exploiting identity labels. Due to the
blind trust in imperfect clustering results, the learning is inevitably misled
by unreliable pseudo labels. Although pseudo label refinement has been
investigated in previous works, they generally leverage auxiliary information
such as camera IDs and body part predictions. This work explores the internal
characteristics of clusters to refine pseudo labels. To this end,
Confidence-Guided Centroids (CGC) are proposed to provide reliable cluster-wise
prototypes for feature learning. Since samples with high confidence are
exclusively involved in the formation of centroids, the identity information of
low-confidence samples, i.e., boundary samples, is unlikely to contribute to
the corresponding centroid. Given the new centroids, the current learning
scheme, in which samples are forced to learn solely from their assigned
centroids, is unwise. To remedy this, we propose the Confidence-Guided pseudo
Label (CGL), which enables samples to approach not only the originally assigned
centroid but other centroids that are potentially embedded with their identity
information. Empowered by confidence-guided centroids and labels, our method
yields comparable performance with, or even outperforms, state-of-the-art
pseudo label refinement works that largely leverage auxiliary information.
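The two components above, centroids built from high-confidence cluster members and soft labels over all centroids, can be sketched in a few lines of numpy. The confidence threshold, temperature, and function names are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def confidence_guided_centroids(feats, pseudo_labels, conf, thresh=0.5):
    """Form each cluster centroid from high-confidence members only, so
    boundary samples do not pollute the prototype."""
    centroids = {}
    for k in np.unique(pseudo_labels):
        members = (pseudo_labels == k) & (conf >= thresh)
        if not members.any():               # fall back to the whole cluster
            members = pseudo_labels == k
        centroids[k] = feats[members].mean(axis=0)
    return centroids

def confidence_guided_labels(feat, centroids, temp=0.1):
    """Soft label: softmax over cosine similarity to every centroid, so a
    sample can also approach centroids beyond its assigned one."""
    keys = sorted(centroids)
    sims = np.array([feat @ centroids[k] /
                     (np.linalg.norm(feat) * np.linalg.norm(centroids[k]) + 1e-12)
                     for k in keys])
    logits = sims / temp
    probs = np.exp(logits - logits.max())
    return dict(zip(keys, probs / probs.sum()))
```

A sample near a cluster boundary then receives non-trivial probability mass on neighboring centroids instead of a hard one-hot assignment.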
Redesigning Multi-Scale Neural Network for Crowd Counting
Perspective distortions and crowd variations make crowd counting a
challenging task in computer vision. To tackle it, many previous works have
used multi-scale architecture in deep neural networks (DNNs). Multi-scale
branches can be either directly merged (e.g. by concatenation) or merged
through the guidance of proxies (e.g. attentions) in the DNNs. Despite their
prevalence, these combination methods are not sophisticated enough to deal with
the per-pixel performance discrepancy over multi-scale density maps. In this
work, we redesign the multi-scale neural network by introducing a hierarchical
mixture of density experts, which hierarchically merges multi-scale density
maps for crowd counting. Within the hierarchical structure, an expert
competition and collaboration scheme is presented to encourage contributions
from all scales; pixel-wise soft gating nets are introduced to provide
pixel-wise soft weights for scale combinations in different hierarchies. The
network is optimized using both the crowd density map and the local counting
map, where the latter is obtained by local integration on the former.
Optimizing both can be problematic because of their potential conflicts. We
introduce a new relative local counting loss based on relative count
differences among hard-predicted local regions in an image, which proves to be
complementary to the conventional absolute error loss on the density map.
Experiments show that our method achieves the state-of-the-art performance on
five public datasets, i.e., ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd,
and TRANCOS. Comment: IEEE Transactions on Image Processing.
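The local counting map and the relative loss described above can be sketched as follows: local integration is a sum over non-overlapping patches of the density map, and the relative loss compares count ratios between regions rather than absolute errors. The patch size, the pairwise ratio form, and the function names are illustrative assumptions:

```python
import numpy as np

def local_count_map(density, region=2):
    """Local integration: sum the density map over non-overlapping
    region x region patches to obtain the local counting map."""
    h, w = density.shape
    d = density[:h - h % region, :w - w % region]
    return d.reshape(h // region, region, w // region, region).sum(axis=(1, 3))

def relative_count_loss(pred_counts, gt_counts, eps=1e-6):
    """Toy relative loss: penalize pairwise differences between predicted
    and ground-truth count ratios across local regions (an illustrative
    simplification of a relative local counting loss)."""
    p, g = pred_counts.ravel(), gt_counts.ravel()
    loss = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            loss += abs((p[i] + eps) / (p[j] + eps) -
                        (g[i] + eps) / (g[j] + eps))
    return loss
```

Because ratios are invariant to a global over- or under-count, such a term complements an absolute density-map loss instead of duplicating it.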